--- title: Experience Replay keywords: fastai sidebar: home_sidebar summary: "Experience Replay is likely the simplest form of memory used by RL agents. " description: "Experience Replay is likely the simplest form of memory used by RL agents. " nb_path: "nbs/06a_memory.experience_replay.ipynb" ---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class ExperienceReplayException[source]

ExperienceReplayException() :: Exception

Common base class for all non-exit exceptions.

{% endraw %} {% raw %}

class ExperienceReplay[source]

ExperienceReplay(bs=16, max_sz=200, warmup_sz=100, memory:Optional[BD]=None)

Parameters:

  • bs : <class 'int'>, optional

    Number of entries to query from memory

  • max_sz : <class 'int'>, optional

    Maximum number of entries to hold. Will start overwriting after.

  • warmup_sz : <class 'int'>, optional

    Minimum number of entries needed to continue with a batch

  • memory : typing.Union[fastrl.core.BD, NoneType], optional
{% endraw %} {% raw %}
{% endraw %}
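Before exercising the real class, the ring-buffer semantics implied by these parameters can be sketched in plain Python. The `MiniReplay` class below, its names, and its modulo pointer arithmetic are illustrative assumptions, not the fastrl `ExperienceReplay` implementation:

```python
# Hypothetical ring-buffer replay illustrating bs / max_sz / warmup_sz.
# This is a sketch, not the fastrl ExperienceReplay implementation.
import random

class MiniReplay:
    def __init__(self, bs=16, max_sz=200, warmup_sz=100):
        self.bs, self.max_sz, self.warmup_sz = bs, max_sz, warmup_sz
        self.memory = []   # grows until max_sz, then slots get overwritten
        self.pointer = 0   # next slot to write

    def add(self, entries):
        for e in entries:
            if len(self.memory) < self.max_sz:
                self.memory.append(e)          # still filling up
            else:
                self.memory[self.pointer] = e  # overwrite the oldest slot
            self.pointer = (self.pointer + 1) % self.max_sz

    def sample(self):
        if len(self.memory) < self.warmup_sz:
            return None                        # not warmed up yet
        return random.sample(self.memory, self.bs)

replay = MiniReplay(bs=4, max_sz=10, warmup_sz=6)
replay.add(range(5))
print(replay.sample())        # None: 5 entries < warmup_sz of 6
replay.add(range(5, 12))      # fills to 10, then wraps and overwrites 2 slots
print(replay.memory[:2])      # [10, 11]: the oldest entries were replaced
```

The key behaviors to keep in mind for the tests below: sampling is gated until `warmup_sz` entries exist, and once `max_sz` is reached new entries overwrite the oldest slots.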

Let's generate some batches to test with...

{% raw %}
from fastrl.data.gym import *
source=Source(
    cbs=[GymLoop(env_name='CartPole-v1',steps_delta=1,steps_count=1,seed=0),FirstLast]
)
learn=fake_gym_learner(source,n=1000,bs=5)
batches=[BD(b[0]) for b in learn.dls[0]]
{% endraw %} {% raw %}
experience_replay=ExperienceReplay(max_sz=20,warmup_sz=19)
test_len(experience_replay,0)
{% endraw %}

What if we fill up ER? Let's add the batches; this process happens in place...

{% raw %}
experience_replay+batches[0]
test_eq(experience_replay.pointer,5)
test_len(experience_replay,5)
{% endraw %}

If we add again, the total size should be 10...

{% raw %}
experience_replay+batches[1]
test_eq(experience_replay.pointer,10)
test_len(experience_replay,10)
test_eq(experience_replay.memory['step'],(batches[0]+batches[1])['step'])
{% endraw %} {% raw %}
experience_replay+batches[2]
test_len(experience_replay,15)
test_eq(experience_replay.pointer,15)
test_eq(experience_replay.memory['step'],(batches[0]+batches[1]+batches[2])['step'])
{% endraw %} {% raw %}
experience_replay+batches[3]
test_len(experience_replay,20)
test_eq(experience_replay.pointer,20)
test_eq(experience_replay.memory['step'],(batches[0]+batches[1]+batches[2]+batches[3])['step'])
{% endraw %}

The steps are what we expect so far. What if ER is full and we add more batches? Since we are at the maximum memory size, the next batch should completely overwrite the first 5 entries...

{% raw %}
experience_replay+batches[4]
test_len(experience_replay,20)
test_eq(experience_replay.pointer,5)
test_eq(experience_replay.memory['step'],(batches[4]+batches[1]+batches[2]+batches[3])['step'])
{% endraw %}

Further additions should overwrite the rest of the entries...

{% raw %}
experience_replay+batches[5]+batches[6]+batches[7]
test_eq(experience_replay.memory['step'],(batches[4]+batches[5]+batches[6]+batches[7])['step'])
test_eq(experience_replay.pointer,20)
{% endraw %}

So we have now fully overwritten the memory, which shows that the overwrite logic works. Let's see what happens when we keep adding batches...

{% raw %}
experience_replay+batches[8]+batches[9]+batches[10]
test_eq(experience_replay.pointer,15)
test_eq(experience_replay.memory['step'],(batches[8]+batches[9]+batches[10]+batches[7])['step'])
{% endraw %}

What if we need to split a batch to fit at the end and the beginning of the memory? This is the trickiest part: some of the dictionary needs to be allocated to the end of the memory, and the rest at the start.
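The wrap-around write can be illustrated with plain slice arithmetic. The variable names and split logic below are illustrative assumptions about this behavior, not fastrl's internal code:

```python
# Hypothetical sketch of a wrapped write: the batch is split into a tail
# segment (end of memory) and a head segment (start of memory).
max_sz = 20
memory = list(range(max_sz))       # stand-in for 20 stored steps
pointer = 15                       # next write position
batch = list(range(100, 110))      # 10 new entries: 5 fit, 5 must wrap

tail_room = max_sz - pointer                          # 5 slots left at the end
memory[pointer:] = batch[:tail_room]                  # fill slots 15..19
memory[:len(batch) - tail_room] = batch[tail_room:]   # wrap into slots 0..4
pointer = (pointer + len(batch)) % max_sz

print(pointer)       # 5
print(memory[:5])    # [105, 106, 107, 108, 109]
```

Note how the pointer ends up at 5 after the wrapped write, matching the pointer value tested below.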

{% raw %}
single_large_batch=batches[11]+batches[12]
experience_replay+single_large_batch;
{% endraw %} {% raw %}
test_eq(experience_replay.pointer,5)
test_eq(experience_replay.memory['step'],(batches[12]+batches[9]+batches[10]+batches[11])['step'])
{% endraw %}

What if we sample the experience?

{% raw %}
full_memory=(batches[12]+batches[9]+batches[10]+batches[11])
entry_ids=[str(o) for o in torch.hstack((full_memory['step'],full_memory['episode_id']))]
memory_hits=[False]*len(entry_ids)
{% endraw %}

We should be able to sample enough times that we have sampled everything. We test this by repeatedly sampling, checking whether each sampled entry has been seen before, and recording it.

{% raw %}
for i in range(7):
    res,idxs=experience_replay.sample()
    for o in torch.hstack((res['step'],res['episode_id'])):
        memory_hits[entry_ids.index(str(o))]=True
test_eq(all(memory_hits),True)
{% endraw %}

What happens when we index the experience replay?

{% raw %}
test_eq(experience_replay[5:10].memory['step'],batches[9]['step'])
{% endraw %}

What happens when we try to update the td_errors?

{% raw %}
experience_replay.update_td(TensorBatch(torch.full((5,1),1.0)),torch.arange(5,10))
{% endraw %} {% raw %}
test_eq(experience_replay.memory['td_error'].sum(),5)
test_eq(experience_replay.memory['td_error'][torch.arange(5,10)].sum(),5)
test_eq(experience_replay.memory['td_error'][torch.arange(6,11)].sum(),4)
{% endraw %} {% raw %}

class ExperienceReplayCallback[source]

ExperienceReplayCallback(bs=16, max_sz=200, warmup_sz=100, memory:Optional[BD]=None) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

Parameters:

  • bs : <class 'int'>, optional

  • max_sz : <class 'int'>, optional

  • warmup_sz : <class 'int'>, optional

  • memory : typing.Union[fastrl.core.BD, NoneType], optional

{% endraw %} {% raw %}
{% endraw %} {% raw %}
from fastrl.data.gym import *
source=Source(cbs=[GymLoop(env_name='CartPole-v1',steps_delta=1,steps_count=1,seed=0,mode='rgb_array'),
                   ResReduce(reduce_by=4),
                   FirstLast])
learn=fake_gym_learner(source,n=30,bs=10)
{% endraw %} {% raw %}
experience_replay=ExperienceReplayCallback(bs=5,max_sz=20,warmup_sz=11)
experience_replay.learn=learn
{% endraw %} {% raw %}
for b in learn.dls[0]:
    learn.xb=b
    
    try:
        experience_replay.after_pred()
        print('memory sampled')
    except CancelBatchException:
        print('memory is not full yet!')
memory is not full yet!
memory sampled
memory sampled
{% endraw %} {% raw %}
experience_replay.experience_replay.update_td(TensorBatch(torch.rand((5,1))),torch.arange(5,10))
{% endraw %}
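The gating behaviour demonstrated above can be sketched without fastai. `CancelBatch` stands in for fastai's `CancelBatchException`, and `ReplayGate` is a hypothetical simplification, not fastrl's `ExperienceReplayCallback`:

```python
# Hypothetical sketch of the callback's warmup gating: store each incoming
# batch, cancel the training step until warmup_sz entries exist, then
# return a random sample to train on instead of the raw batch.
import random

class CancelBatch(Exception):
    """Stand-in for fastai's CancelBatchException."""

class ReplayGate:
    def __init__(self, bs=5, warmup_sz=11):
        self.bs, self.warmup_sz = bs, warmup_sz
        self.memory = []

    def after_pred(self, batch):
        self.memory.extend(batch)
        if len(self.memory) < self.warmup_sz:
            raise CancelBatch            # skip this batch: not warmed up
        return random.sample(self.memory, self.bs)

gate = ReplayGate(bs=5, warmup_sz=11)
for b in [list(range(10)), list(range(10, 20)), list(range(20, 30))]:
    try:
        gate.after_pred(b)
        print('memory sampled')
    except CancelBatch:
        print('memory is not full yet!')
```

With batches of 10 and a warmup of 11, the first batch is cancelled and the following ones are sampled, which is the same `memory is not full yet!` / `memory sampled` sequence printed by the loop above.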

Memory Exploration

{% raw %}
{% endraw %} {% raw %}

get_steps[source]

get_steps(maxlen=0, atomic=False)

Parameters:

  • maxlen : <class 'int'>, optional

  • atomic : <class 'bool'>, optional

Returns:

  • typing.Tuple[typing.List[str], typing.Dict]
{% endraw %} {% raw %}
{% endraw %} {% raw %}

plot_replay_stats[source]

plot_replay_stats(experience_replay, columns='td_error', n_cols=2)

Parameters:

  • experience_replay : <class 'inspect._empty'>

  • columns : <class 'str'>, optional

  • n_cols : <class 'int'>, optional

{% endraw %} {% raw %}
{% endraw %} {% raw %}
plot_replay_stats(experience_replay,n_cols=1)
{% endraw %} {% raw %}

figures_to_html[source]

figures_to_html(figs)

Parameters:

  • figs : <class 'inspect._empty'>
{% endraw %} {% raw %}
{% endraw %} {% raw %}
figures_to_html([
    px.imshow(images,animation_frame=0),
    plot_replay_stats(experience_replay.experience_replay,n_cols=1)
])
{% endraw %}